This dataset consists of 1.5M beer reviews from the website Beeradvocate.com. It includes various pieces of information about each beer and the reviewer's impressions of it. I chose to focus my analysis on four fields: ABV, beer name, beer style and overall impression.
Before performing any analysis I did some cleaning of the dataset. I noticed there were a large number of beers with only 1 review. I treated these as erroneous entries, caused by a spelling error in the beer's name or some other mistake by the user. Before removing them, the dataset held a total of ~57K unique beers with an average of 28 reviews per beer. After removing all beers with a single review, the total sits at ~38K distinct beers with an average of 41 reviews per beer. The average ABV across the dataset is 7.1% and the max is 43%, that’s one seriously strong beer!
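For anyone who wants to try the same cleaning step, here is a minimal sketch in base R. It runs on a tiny made-up data frame, and the column names (`beer_name`, `beer_abv`, `review_overall`) are assumptions for illustration, not necessarily the names in the real file. It also identifies a beer by name alone, which may or may not match how the counts above were produced.

```r
# Toy stand-in for the review data; values and column names are assumptions.
beer <- data.frame(
  beer_name      = c("A", "A", "B", "C", "C", "C"),
  beer_abv       = c(5.0, 5.0, 7.5, 9.0, 9.0, 9.0),
  review_overall = c(3.5, 4.0, 4.5, 4.0, 4.5, 5.0),
  stringsAsFactors = FALSE
)

# Count how many reviews each beer name has.
reviews.per.beer <- table(beer$beer_name)

# Keep only the beers that were reviewed more than once.
keep <- names(reviews.per.beer)[reviews.per.beer > 1]
beer.clean <- beer[beer$beer_name %in% keep, ]

# Summary figures before and after cleaning.
beers.before <- length(unique(beer$beer_name))
beers.after  <- length(unique(beer.clean$beer_name))
avg.reviews  <- mean(table(beer.clean$beer_name))
avg.abv      <- mean(beer.clean$beer_abv, na.rm = TRUE)
max.abv      <- max(beer.clean$beer_abv, na.rm = TRUE)
```

In this toy data beer "B" has a single review, so it is the only one dropped.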
Since I just spoke about ABVs, let’s dive a little deeper into that category of the data. The following table displays the top 10 ABVs, ranked by the total number of reviews for beers with that ABV.
Clocking in with almost twice as many reviews as number two are beers with an ABV of 5%. Some of the most notable beers in this category are Budweiser, Stella Artois, Heineken and Miller High Life. These beers are all widely available in stores and bars, so it’s no wonder they have so many reviews!
While 5% may be the most reviewed, it actually has the lowest average rating of these 10, with a score of 3.64. The highest average rating belongs to 9% ABV with a score of 3.96, followed by 7.5% at 3.94, then a three-way tie between 7%, 8% and 10% at 3.93.
Another interesting tidbit is that these top 10 ABVs make up 38.3% of the total dataset!
paste(round(sum(abv.top10) / nrow(beer) * 100, digits = 1), "%", sep = '')
## [1] "38.3%"
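The definition of `abv.top10` isn't shown here, so the following is only a sketch of one way a vector like it could be built in base R, along with the average rating per ABV. The data is made up, the column names `beer_abv` and `review_overall` are assumptions, and it keeps the top 2 ABVs instead of 10 so the toy example stays small.

```r
# Toy data; values and column names are assumptions for illustration.
beer <- data.frame(
  beer_abv       = c(5, 5, 5, 5, 9, 9, 7.5, 4.2),
  review_overall = c(3.5, 3.5, 4.0, 3.5, 4.0, 4.5, 4.0, 3.0)
)

n.top <- 2  # the write-up uses 10

# Number of reviews per ABV value, most reviewed first.
abv.counts <- sort(table(beer$beer_abv), decreasing = TRUE)
abv.top <- head(abv.counts, n.top)

# Share of all reviews covered by the top ABVs, formatted as a percentage.
share <- paste0(round(sum(abv.top) / nrow(beer) * 100, digits = 1), "%")

# Average overall rating for each of the top ABVs.
# Matching on the character form avoids comparing floating point values.
top.rows <- beer[as.character(beer$beer_abv) %in% names(abv.top), ]
abv.means <- tapply(top.rows$review_overall, top.rows$beer_abv, mean)
```

Here `paste0()` does the same job as `paste(..., sep = '')` in the call above.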
Let’s shift focus now and move over to the top 10 beers by name determined by the total number of reviews.
As you can see from the chart below, Sierra Nevada Brewing placed two different beers in the top 10, with their Celebration Ale coming in at number 3 with 3000 reviews and their Pale Ale at number 7 with almost 2600.
All 10 of these beers scored above a 4.0 average in the overall impression category, with the lowest being the Arrogant Bastard Ale from Stone Brewing at 4.1 and the highest being Pliny The Elder from Russian River Brewing at 4.6.
Next up is our top 10 most reviewed beer styles. The front runner here, by over 30K reviews, is the American IPA style with ~117K reviews, followed by the double IPA at ~86K. The average ratings in this category vary more than in the previous one: the highest is the American Double or Imperial Stout at 4.03, and in last place is the Fruit or Vegetable beer with an average overall impression score of 3.42. That is a spread of 0.61, compared to 0.5 for the top 10 beers by name.
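The spread is simply the highest average rating minus the lowest, so 4.03 - 3.42 = 0.61 for the styles and 4.6 - 4.1 = 0.5 for the beers. A rough sketch of how the per-style numbers could be computed in base R follows. The data is invented, the style labels are only illustrative, and the column names `beer_style` and `review_overall` are assumptions.

```r
# Toy data; style labels, values and column names are assumptions.
beer <- data.frame(
  beer_style = c("American IPA", "American IPA", "American IPA",
                 "Fruit / Vegetable Beer", "Fruit / Vegetable Beer",
                 "American Porter"),
  review_overall = c(4.0, 4.5, 3.5, 3.0, 3.5, 4.0),
  stringsAsFactors = FALSE
)

n.top <- 2  # the write-up uses 10

# Most reviewed styles first.
style.counts <- sort(table(beer$beer_style), decreasing = TRUE)
top.styles <- names(head(style.counts, n.top))

# Average overall rating for each of the top styles.
top.rows <- beer[beer$beer_style %in% top.styles, ]
style.means <- tapply(top.rows$review_overall, top.rows$beer_style, mean)

# Spread between the best and worst rated of the top styles.
spread <- max(style.means) - min(style.means)
```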
Diving into the ABV distributions for the top 10 styles, we can see some pretty large differences within the group. Each style has a slightly different interquartile range and number of outliers.
The American Pale, IPA and Porter all have short boxes, indicating that their interquartile ranges are relatively small. The American Strong Ale and the Imperial Stout, on the other hand, both have tall boxes, indicating that their interquartile ranges are larger. These are also the only two styles that do not have any outliers below their lower whisker.
The Imperial Stout and the Imperial IPA have the widest ranges, each with a difference of more than 30 percentage points between its minimum and maximum ABV.
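The numbers behind a box plot can also be pulled out directly. Below is a small sketch in base R using `IQR()` for the height of each box, `range()` for the distance between minimum and maximum, and `boxplot.stats()` for the whiskers and outliers. The two styles and their ABV values are made up, and the column names `beer_style` and `beer_abv` are assumptions.

```r
# Toy data; style labels, values and column names are assumptions.
beer <- data.frame(
  beer_style = rep(c("Style A", "Style B"), each = 5),
  beer_abv   = c(5, 5.5, 6, 6.5, 7, 8, 9, 10, 12, 18),
  stringsAsFactors = FALSE
)

# One numeric vector of ABVs per style.
abv.by.style <- split(beer$beer_abv, beer$beer_style)

# Height of each box: the interquartile range.
style.iqr <- sapply(abv.by.style, IQR)

# Distance between the minimum and maximum ABV.
style.range <- sapply(abv.by.style, function(x) diff(range(x)))

# Whisker ends, hinges, median and outliers as a box plot would draw them.
stats <- lapply(abv.by.style, boxplot.stats)

# Number of outliers that fall below the lower whisker.
low.outliers <- sapply(stats, function(s) sum(s$out < s$stats[1]))
```

In this toy data "Style B" has the taller box and one outlier above its upper whisker, and neither style has outliers below the lower whisker.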